217 research outputs found
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this end, we rely on the concept of "query refinement" to estimate the
probability that a query is ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness.
Comment: VLDB201
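To make the kind of method compared in this abstract concrete, here is a minimal sketch of greedy, xQuAD-style re-ranking. It is not the paper's OptSelect algorithm, and the function name, the toy documents, and every probability below are assumptions made up for illustration. Each step picks the document that maximizes (1 - lam) * P(d|q) + lam * sum over aspects s of P(s|q) * P(d|s) * (probability that s is not yet covered by the documents already selected).

```python
def xquad_rerank(relevance, aspect_prob, coverage, k, lam=0.5):
    """Greedy xQuAD-style selection (illustrative sketch, hypothetical names).

    relevance:   dict doc -> P(d|q), relevance of the document to the query
    aspect_prob: dict aspect -> P(s|q), importance of each query aspect
    coverage:    dict (doc, aspect) -> P(d|s), how well the doc covers the aspect
    """
    selected = []
    # not_covered[s] = product over selected d' of (1 - P(d'|s))
    not_covered = {s: 1.0 for s in aspect_prob}
    candidates = set(relevance)

    def score(doc):
        diversity = sum(
            aspect_prob[s] * coverage.get((doc, s), 0.0) * not_covered[s]
            for s in aspect_prob
        )
        return (1.0 - lam) * relevance[doc] + lam * diversity

    while candidates and len(selected) < k:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen document count less from now on.
        for s in aspect_prob:
            not_covered[s] *= 1.0 - coverage.get((best, s), 0.0)
    return selected


# Toy data (assumed): an ambiguous query with two equally likely aspects.
RELEVANCE = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
ASPECTS = {"car": 0.5, "animal": 0.5}
COVERAGE = {("d1", "car"): 0.9, ("d2", "car"): 0.8, ("d3", "animal"): 0.9}

diversified = xquad_rerank(RELEVANCE, ASPECTS, COVERAGE, k=2, lam=0.5)
relevance_only = xquad_rerank(RELEVANCE, ASPECTS, COVERAGE, k=2, lam=0.0)
```

With lam=0.5 the second slot goes to d3, which covers the otherwise missing "animal" aspect; with lam=0.0 the ranking falls back to pure relevance order. Each greedy step rescans all remaining candidates, which is one reason the response time of such methods is worth measuring.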
Learning Relatedness Measures for Entity Linking
Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of these features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes disambiguation errors.
The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
On Suggesting Entities as Web Search Queries
The Web of Data is growing in popularity and dimension,
and named entity exploitation is gaining importance in many research
fields. In this paper, we explore the use of entities that can be extracted
from a query log to enhance query recommendation. In particular, we
extend a state-of-the-art recommendation algorithm to take into account
the semantic information associated with submitted queries. Our novel
method generates highly related and diversified suggestions that we
assess by means of a new evaluation technique. The manually annotated
dataset used for performance comparisons has been made available to
the research community to favor the repeatability of experiments.
SE-PQA: Personalized Community Question Answering
Personalization in Information Retrieval has been studied for a long time.
Nevertheless, there is still a lack of high-quality, real-world datasets to
conduct large-scale experiments and evaluate models for personalized search.
This paper contributes to filling this gap by introducing SE-PQA (StackExchange
- Personalized Question Answering), a new curated resource to design and
evaluate personalized models related to the task of community Question
Answering (cQA). The contributed dataset includes more than 1 million queries
and 2 million answers, annotated with a rich set of features modeling the
social interactions among the users of a popular cQA platform. We describe the
characteristics of SE-PQA and detail the features associated with questions and
answers. We also provide reproducible baseline methods for the cQA task based
on the resource, including deep learning models and personalization approaches.
The results of the preliminary experiments conducted show the appropriateness
of SE-PQA to train effective cQA models; they also show that personalization
remarkably improves the effectiveness of all the methods tested. Furthermore,
we show the benefits in terms of robustness and generalization of combining
data from multiple communities for personalization purposes.
Social Search: retrieving information in Online Social Platforms -- A Survey
Social Search research deals with studying methodologies exploiting social
information to better satisfy user information needs in Online Social Media
while simplifying the search effort and consequently reducing the time spent
and the computational resources utilized. Starting from previous studies, in
this work, we analyze the current state of the art of the Social Search area,
proposing a new taxonomy and highlighting current limitations and open research
directions. We divide the Social Search area into three subcategories, where
the social aspect plays a pivotal role: Social Question&Answering, Social
Content Search, and Social Collaborative Search. For each subcategory, we
present the key concepts and selected representative approaches in the
literature in greater detail. We found that, up to now, a large body of studies
model users' preferences and their relations by simply combining social
features made available by social platforms. This paves the way for significant
research to exploit more structured information about users' social profiles
and behaviors (as they can be inferred from data available on social platforms)
to better satisfy their information needs.
Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
Deep pretrained transformer networks are effective at various ranking tasks,
such as question answering and ad-hoc document ranking. However, their
computational expense makes them cost-prohibitive in practice. Our proposed
approach, called PreTTR (Precomputing Transformer Term Representations),
considerably reduces the query-time latency of deep transformer networks (up to
a 42x speedup on web document ranking) making these networks more practical to
use in a real-time ranking scenario. Specifically, we precompute part of the
document term representations at indexing time (without a query), and merge
them with the query representation at query time to compute the final ranking
score. Due to the large size of the token representations, we also propose an
effective approach to reduce the storage requirement by training a compression
layer to match attention scores. Our compression technique reduces the storage
required by up to 95% and can be applied without a substantial degradation in
ranking performance.
Comment: Accepted at SIGIR 2020 (long
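The precompute-then-merge split described in this abstract can be illustrated with a toy sketch. It is not the PreTTR model: `encode_tokens` and `joint_score` are hypothetical stand-ins for the query-independent lower layers and the joint upper layers of a transformer, and the documents are made up. The sketch only shows that document-side work can be moved to indexing time without changing the final ranking, as long as that work does not depend on the query.

```python
def encode_tokens(text):
    """Hypothetical stand-in for the query-independent lower layers.

    Each token is mapped to a small vector using only the token itself,
    so the result can be computed once per document at indexing time.
    """
    return [
        (float(len(tok)), float(sum(map(ord, tok)) % 5))
        for tok in text.lower().split()
    ]


def joint_score(query_vecs, doc_vecs):
    """Hypothetical stand-in for the upper layers that see query and document."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]
    # For each query token, take its best-matching document token.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def build_index(docs):
    """Indexing time: precompute and store the document term representations."""
    return {doc_id: encode_tokens(text) for doc_id, text in docs.items()}


def rank_with_index(query, index):
    """Query time: encode only the query, then merge with stored vectors."""
    q = encode_tokens(query)
    return sorted(index, key=lambda d: (-joint_score(q, index[d]), d))


def rank_from_scratch(query, docs):
    """Baseline: re-encode every document for every query."""
    q = encode_tokens(query)
    return sorted(docs, key=lambda d: (-joint_score(q, encode_tokens(docs[d])), d))


# Toy collection (assumed for illustration).
DOCS = {
    "a": "transformer networks for document ranking",
    "b": "query logs of a digital library",
    "c": "precomputing term representations at indexing time",
}
INDEX = build_index(DOCS)
fast = rank_with_index("transformer ranking", INDEX)
slow = rank_from_scratch("transformer ranking", DOCS)
```

Both functions return the same ranking, but `rank_with_index` performs no document encoding at query time. The stored vectors are the part a compression layer, as mentioned in the abstract, would shrink.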
Discovering Europeana users’ search behavior
Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of the Europeana users.
Exploring Social Media for Event Attendance
Large popular events are nowadays well reflected in social media fora (e.g. Twitter), where people discuss their interest in participating in the events. In this paper we propose to exploit the content of non-geotagged posts in social media to build machine-learned classifiers able to infer users' attendance of large events in three temporal periods: before, during and after an event. The categories of features used to train the classifier reflect four different dimensions of social media: textual, temporal, social, and multimedia content. We detail the approach followed to design the feature space and report on experiments conducted on two large music festivals in the UK, namely the VFestival and Creamfields events. Our attendance classifier attains very high accuracy; the highest result, about 87% accuracy in classifying users who will participate in the event, is observed for the Creamfields dataset.